Ordered Fast Fourier Transforms on a Massively Parallel Hypercube Multiprocessor

Authors

  • Charles Tong
  • Paul N. Swarztrauber
Abstract

We examine design alternatives for ordered FFT algorithms on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication, which is known to dominate the overall computing time. To this end we combine the ordering and computational phases of the FFT and also use sequence-to-processor maps that reduce communication. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely "standard-order" and "A-order", which can be implemented with equal ease on the Connection Machine, where orderings are determined by geometries and priorities. If the sequence has $N = 2^r$ elements and the hypercube has $P = 2^d$ processors, then a standard-order FFT can be implemented with $d + r/2 + 1$ parallel transmissions. An A-order sequence can be transformed with $2d - r/2$ parallel transmissions, which is $r - d + 1$ fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.

1 Department of Computer Science, University of California at Los Angeles, Los Angeles, California 90024-1596.

2 National Center for Atmospheric Research, Boulder, Colorado 80307, which is sponsored by the National Science Foundation.

3 This work was supported by the NAS Systems Division via Cooperative Agreement NCC 2-387 between NASA and the University Space Research Association (USRA). It was performed while the authors were visiting the Research Institute for Advanced Computer Science (RIACS), NASA Ames Research Center, Moffett Field, CA 94035.
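The transmission counts quoted in the abstract can be checked with a little arithmetic. The following is a minimal sketch, not code from the paper: the function name `transmission_counts` and the sample values r = 20, d = 16 are illustrative assumptions, and the sketch assumes r is even so that r/2 is an integer. The abstract does not state the range of r and d for which the formulas hold, so the function simply evaluates them as written.

```python
def transmission_counts(r: int, d: int) -> dict:
    """Evaluate the parallel-transmission counts stated in the abstract.

    r: log2 of the sequence length (N = 2**r)
    d: log2 of the processor count (P = 2**d)
    Assumption (not from the source): r is even, so r/2 is an integer.
    """
    if r % 2 != 0:
        raise ValueError("this sketch assumes an even r so that r/2 is an integer")
    standard = d + r // 2 + 1   # standard-order FFT: d + r/2 + 1
    a_order = 2 * d - r // 2    # A-order FFT: 2d - r/2
    return {
        "standard": standard,
        "a_order": a_order,
        "saving": standard - a_order,  # should equal r - d + 1
    }


# Hypothetical sizes: N = 2**20 elements on P = 2**16 processors.
counts = transmission_counts(r=20, d=16)
print(counts)
```

For these sample values the standard-order count is 27 and the A-order count is 22, a difference of 5, which agrees with r − d + 1 = 20 − 16 + 1.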

Similar resources

Multiprocessor FFTs

Several multiprocessor FFTs are developed in this paper for both vector multiprocessors with shared memory and the hypercube. Two FFTs for vector multiprocessors are given that compute an ordered transform and have a stride of one except for a single "link" step. Since multiple FFTs provide additional options for both vectorization and distribution we show that a single FFT can be performed in ...

Implementing 2-D and 3-D Discrete Hartley Transforms on a Massively Parallel SIMD Mesh Computer

The discrete Hartley transform (DHT) is known to outperform the fast Fourier transform (FFT) on sequential machines. Here we investigate parallel algorithms and implementations of two- and three-dimensional DHT in order to determine if the advantage of Hartley transforms over Fourier transforms carries over to parallel environments as well. Our extensive empirical study of the performances of DHT and FFT ...

Parallel Three-Dimensional Nonequispaced Fast Fourier Transforms and Their Application to Particle Simulation

In this paper we describe a parallel algorithm for calculating nonequispaced fast Fourier transforms on massively parallel distributed memory architectures. These algorithms are implemented in an open source software library called PNFFT. Furthermore, we derive a parallel fast algorithm for the computation of the Coulomb potentials and forces in a charged particle system, which is based on the ...

A Prototypical Self-Optimizing Package for Parallel Implementation of Fast Signal Transforms

This paper presents a self-adapting parallel package for computing the Walsh-Hadamard transform (WHT), a prototypical fast signal transform, similar to the fast Fourier transform. Using a search over a space of mathematical formulas representing different algorithms to compute the WHT, the package finds the best parallel implementation on a given shared-memory multiprocessor. The search automat...

Optimal Matrix Transposition and Bit Reversal on Hypercubes: All-to-All Personalized Communication

In a hypercube multiprocessor with distributed memory, messages have a street address and an apartment number, i.e., a hypercube node address and a local memory address. Here we describe an optimal algorithm for performing the communication described by exchanging the bits of the node address with that of the local address. These exchanges occur typically in both matrix transposition and bit re...

Journal:
  • J. Parallel Distrib. Comput.

Volume 12  Issue 

Pages  -

Publication date 1991